A Consensus Algorithm for Approximate String Matching

نویسندگان

  • Miguel Rubio
  • Alfonso Alba
  • Martín Mendez
  • Edgar Arce-Santana
  • Margarita Rodriguez-Kessler
چکیده

Approximate string matching (ASM) is a well-known computational problem with important applications in database searching, plagiarism detection, spelling correction, and bioinformatics. The two main issues with most ASM algorithms are (1) computational complexity, and (2) low specificity due to a large amount of false positives being reported. In this paper, a very efficient ASM method is proposed, along with a post -processing stage designed to significantly reduce the amount of false positives. Results with random strings show that the proposed method is capable of performing a search within a large (1 Mb) string in about 100 ms, with a sensitivity and specificity of nearly 100%. © 2012 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Global Science and Technology Forum Pte Ltd

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents, produce imperfect records in real world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

متن کامل

Simulation of NFA in Approximate String and Sequence Matching

We present detailed description of simulation of nondeterministic nite automata (NFA) for approximate string matching. This simulation uses bit parallelism and used algorithm is called Shift-Or algorithm. Using knowledge of simulation of NFA by Shift-Or algorithm we design modi cation of ShiftOr algorithm for approximate string matching using generalized Levenshtein distance and modi cation for...

متن کامل

Average-Optimal Multiple Approximate String Matching

We present a new algorithm for multiple approximate string matching, based on an extension of the optimal (on average) singlepattern approximate string matching algorithm of Chang and Marr. Our algorithm inherits the optimality and is also competitive in practice. We present a second algorithm that is linear time and handles higher difference ratios. We show experimentally that our algorithms a...

متن کامل

Data structures and algorithms for approximate string matching

This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms. Special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximate string matching.

متن کامل

A Fast Algorithm for Approximate String Matching on Gene Sequences

Approximate string matching is a fundamental and challenging problem in computer science, for which a fast algorithm is highly demanded in many applications including text processing and DNA sequence analysis. In this paper, we present a fast algorithm for approximate string matching, called FAAST. It aims at solving a popular variant of the approximate string matching problem, the k-mismatch p...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013